Enhancing audio speech using visual speech features
نویسندگان
چکیده
This work presents a novel approach to speech enhancement by exploiting the bimodality of speech and the correlation that exists between audio and visual speech features. For speech enhancement, a visually-derived Wiener filter is developed. This obtains clean speech statistics from visual features by modelling their joint density and making a maximum a posteriori estimate of clean audio from visual speech features. Noise statistics for the Wiener filter utilise an audio-visual voice activity detector which classifies input audio as speech or nonspeech, enabling a noise model to be updated. Analysis shows estimation of speech and noise statistics to be effective with human listening tests measuring the effectiveness of the resulting Wiener filter.
منابع مشابه
Audio and Visual Processing to Enhance User Interfaces
SUMMARY This report details the work carried out between the months of October 1996 and June 1997, and the results so far achieved. Numerous speech and person recognition experiments have been performed using both speech and visual lip features. The discriminatory properties of audio and visual features are examined, along with the performance of two classiiers, namely VQ and DTW. The eeect of ...
متن کاملRecognition of isolated words using Zernike and MFCC features for audio visual speech recognition
Automatic Speech Recognition (ASR) by machine is an attractive research topic in signal processing domain and has attracted many researchers to contribute in this area. In recent year, there have been many advances in automatic speech reading system with the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of audio-visual speech recognition ...
متن کاملA system for audio-visual speech recognition
In this work, a system of audio visual speech recognition will be presented. A new hybrid visual feature combination, which is suitable for audio -visual speech recognition was implemented. The features comprise both the shape and the appearance of lips, the dimensional reduction is applied using discrete cosine transform (DCT). A large visual speech database of the German language has been ass...
متن کاملMulti-Modal Hybrid Deep Neural Network for Speech Enhancement
Deep Neural Networks (DNN) have been successful in enhancing noisy speech signals. Enhancement is achieved by learning a nonlinear mapping function from the features of the corrupted speech signal to that of the reference clean speech signal. The quality of predicted features can be improved by providing additional side channel information that is robust to noise, such as visual cues. In this p...
متن کاملAudio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images
This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are use...
متن کامل